Prediction of 30-day in-hospital mortality in older UGIB patients using a simplified risk score and comparison with AIMS65 score

Background: Upper gastrointestinal bleeding (UGIB) in older patients is associated with substantial in-hospital morbidity and mortality. This study aimed to develop and validate a simplified risk score for predicting 30-day in-hospital mortality in this population.

Methods: A retrospective analysis was conducted on data from 1899 UGIB patients aged ≥ 65 years admitted to a single medical center between January 2010 and December 2019. An additional cohort of 330 patients admitted from January 2020 to October 2021 was used for external validation. Variable selection was performed using five distinct methods, and models were generated using generalized linear models, random forest, support vector machine, and k-nearest neighbors approaches. The developed score, “ABCAP,” incorporated Albumin < 30 g/L, Blood Urea Nitrogen (BUN) > 7.5 mmol/L, Cancer presence, Altered mental status, and Pulse rate > 100/min, each assigned a score of 1. Internal and external validation procedures compared the ABCAP score with the AIMS65 score.

Results: In internal validation, the ABCAP score demonstrated robust predictive capability with an area under the curve (AUC) of 0.878 (95% CI: 0.824–0.932), which was significantly better than the AIMS65 score (AUC: 0.827, 95% CI: 0.751–0.904), as revealed by the DeLong test (p = 0.048). External validation of the ABCAP score resulted in an AUC of 0.799 (95% CI: 0.709–0.889), while the AIMS65 score yielded an AUC of 0.743 (95% CI: 0.647–0.838), with no significant difference between the two scores based on the DeLong test (p = 0.16). However, the ABCAP score at the 3–5 score level demonstrated superior performance in identifying high-risk patients compared to the AIMS65 score. This score exhibited consistent predictive accuracy across variceal and non-variceal UGIB subgroups.

Conclusions: The ABCAP score incorporates easily obtained clinical variables and demonstrates promising predictive ability for 30-day in-hospital mortality in older UGIB patients. It allows effective mortality risk stratification and showed slightly better performance than the AIMS65 score. Further cohort validation is required to confirm generalizability.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12877-024-04971-w.


Background
Upper gastrointestinal bleeding (UGIB) represents a significant clinical challenge, marked by its high prevalence and substantial impact on public health systems globally. In the United Kingdom, the annual incidence of UGIB ranges from 103 to 172 cases per 100,000 adults, with a mortality rate of 8-14% [1]. The situation is similarly serious in the United States, where UGIB accounts for over 800,000 emergency department visits annually, about half of which require hospitalization [2]. In China, the UGIB-specific death rate is estimated at 4-14% [4], highlighting the worldwide burden of this condition.
Evidence suggests that effective prediction tools can significantly improve patient outcomes by facilitating early intervention and appropriate care management [8]. Tools such as the Glasgow-Blatchford score [5], the Rockall score [3], and the AIMS65 score [6] have been pivotal in predicting in-hospital mortality and guiding clinical management. These instruments aid in patient stratification, potentially shortening hospital stays and optimizing resource use [9]. Recent studies suggest that urgent endoscopy guided by high AIMS65 scores may contribute to shorter hospitalization for patients with nonvariceal upper gastrointestinal bleeding [10].
However, how well current prediction tools perform in older adults remains uncertain. The diverse physiological reserves and comorbid conditions of this group challenge the applicability of generalized prediction models. Research on predictive tools for other diseases underscores this issue; for instance, existing cardiovascular risk models [11] and scores for acute respiratory infections [12] have shown limitations when applied to older adults, signaling the need for specialized models that account for age-specific factors. Specifically for older UGIB patients, whether existing prediction tools remain effective, and whether a tailored tool is necessary, warrants further investigation.
As the global population ages rapidly, it becomes imperative to focus on older adults and reevaluate existing scoring systems, especially in settings such as China, where demographic shifts are particularly pronounced [7]. This study aims to develop a robust predictive model for estimating 30-day in-hospital all-cause mortality among older patients with UGIB prior to endoscopic evaluation. We also compare the predictive performance of this new score against the widely used AIMS65 score. Through this work, we aim to refine risk stratification, support clinical decision-making, and ultimately improve the management and outcomes of older patients with UGIB.

Participants
Participants were drawn from the Older Diseases Dataset, a well-established and continuously updated research dataset comprising individuals aged over 60 years. The dataset is derived from the Electronic Health Record of the First Medical Center of the Chinese People's Liberation Army General Hospital (PLAGH). The development dataset encompasses patients admitted between January 2010 and December 2019, as per the version available before 2022. An external validation dataset, comprising patients admitted between January 2020 and October 2021, was used to assess the score's performance beyond its development dataset.
Inclusion Criteria:
• Age at admission of 65 years or older.
• Admission diagnosis of UGIB (blood loss from a gastrointestinal source above the ligament of Treitz), including diagnoses documented by physicians, admission records, and corresponding International Classification of Diseases, 10th Revision (ICD-10) codes.
• Typical symptoms described in patient complaints and medical history, such as hematemesis (vomiting of fresh blood), "coffee-ground" emesis (vomiting of dark altered blood), and/or melena.
• Identification by a gastroenterologist.
• For patients with multiple admissions, only data from the earliest hospitalization were considered.
Exclusion Criteria:
• Nonbleeding periods, such as old bleeding episodes or bleeding history.
• Cases with a significant (> 50%) amount of missing laboratory results.

Data collection

ICD-10 codes for UGIB
The ICD-10 codes of UGIB: see the supplementary document.

Vital signs, checkup, and mental status
Vital signs, including systolic blood pressure (SBP, mmHg), diastolic blood pressure (DBP, mmHg), and pulse (beats per minute), were recorded. Body mass index (BMI) was calculated using the following formula: BMI = weight (kg) / height² (m²). Altered mental status was defined as a Glasgow Coma Scale score of less than 14 or a physician-charted designation of "disoriented," "lethargy," "stupor," or "coma."

Charlson comorbidity index and comorbidities
Comorbidities for the Charlson Comorbidity Index (CCI) were identified using ICD-10 codes, and the CCI score for each patient was calculated by summing the assigned weights of the respective comorbidities (as detailed in the supplementary document). The CCI is a widely used tool for predicting 10-year survival in patients with multiple comorbid conditions [13,14]; it assigns each comorbidity a weight according to its impact on prognosis. In our study, the CCI components were used to describe the health burden of the older population. The collected comorbidities included coronary heart disease (CAD), congestive heart failure (CHF), peripheral vascular disease (PAD), cerebrovascular disease (CVD), chronic obstructive pulmonary disease (COPD), moderate to severe kidney disease (Kidney Diseases), and liver disease (Liver Diseases). Cancers, both metastatic and nonmetastatic, and the other conditions featured in the CCI were also included. In addition, common geriatric comorbidities such as hypertension (HTN) and atrial fibrillation (AF) were recorded.

Endoscopy and outcome
Endoscopy records during hospitalization were collected. The primary outcome was any death occurring within 30 days of hospitalization with UGIB. We also collected data on the length of hospital stay (LOS, days).

Quality control
Two independently trained investigators collected and analyzed data from the electronic medical records. In case of conflicts, higher-level personnel made the final determination.

Statistical methods
All statistical analyses were conducted using R version 4.2.3 (R Core Team, 2023; R Foundation for Statistical Computing, Vienna, Austria; https://www.R-project.org/). Normally distributed variables are presented as the mean ± standard deviation, while nonnormally distributed variables are described as the median (interquartile range). Group comparisons for normally distributed variables employed t tests, whereas the Kruskal-Wallis test was applied for nonnormally distributed variables. Categorical variables were compared using the chi-square test or Fisher's exact test. To assess the association of each variable with the outcome, univariable regression was used.

Training and validation dataset
The development dataset was further divided into a 70% training subset, used for variable selection and model training, and a 30% internal validation subset. The external validation dataset was used for independent model evaluation and to ensure the robustness of the performance assessment. The random partitioning into these subsets was carried out with the 'createDataPartition' function of the 'caret' package.
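The 70/30 partition described above can be illustrated with a minimal sketch. It is written in standard-library Python purely for illustration (the study itself used R), and it assumes the split is stratified on the outcome, which is how 'createDataPartition' behaves for a categorical outcome. The function name `stratified_split`, the seed, and the toy outcome vector are hypothetical; only the cohort size (1899) and the number of 30-day deaths (97) come from the text.

```python
import random
from collections import defaultdict

def stratified_split(outcomes, train_fraction=0.7, seed=0):
    """Split indices into train/validation, sampling train_fraction within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(outcomes):
        by_class[y].append(idx)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within the class
        n_train = round(len(indices) * train_fraction)
        train_idx.extend(indices[:n_train])
        valid_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(valid_idx)

# Toy outcome vector: 97 deaths (1) and 1802 survivors (0), as in the development dataset.
outcomes = [1] * 97 + [0] * 1802
train_idx, valid_idx = stratified_split(outcomes)
deaths_in_train = sum(outcomes[i] for i in train_idx)
```

Stratifying keeps the proportion of deaths roughly equal in both subsets, which matters when the outcome is as rare as it is here.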

Missing value handling
In our study, the missing values, mainly attributable to the lack of testing within a specific time window, can be classified as "Missing Completely at Random" (MCAR). In MCAR scenarios, the absence of data is assumed to be unbiased and unlikely to systematically affect the outcomes, which simplifies imputation and analysis. Missing values within the development dataset were handled as follows.
To ensure the integrity and reliability of the results, variables with more than 20% missing data were excluded from the analysis. In addition, individual cases with more than 50% of laboratory indicators missing were excluded during data screening. For variables with less than 20% missing values, we used the 'missForest' function from the 'missForest' package in R for imputation. To enhance the reliability of the imputed data, we repeated the imputation process five times and conducted a statistical test comparing the imputed data with the original dataset. This comparison showed no significant discrepancies between the imputed data and the original dataset. Detailed information can be found in the supplementary document.
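The two screening thresholds above (variables with more than 20% missing, cases with more than 50% of laboratory values missing) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code; the function names, the order of the two steps (cases first, then variables), and the toy records are assumptions. The imputation step itself is not reproduced.

```python
def missing_fraction(values):
    """Fraction of entries that are None (treated as missing)."""
    return sum(v is None for v in values) / len(values)

def screen_missingness(rows, lab_columns, max_var_missing=0.20, max_case_missing=0.50):
    """rows: list of dicts with None for missing values. Returns (kept_rows, kept_columns)."""
    # Step 1 (assumed order): drop cases with too many missing laboratory values.
    kept_rows = [
        r for r in rows
        if missing_fraction([r[c] for c in lab_columns]) <= max_case_missing
    ]
    if not kept_rows:
        return [], []
    # Step 2: drop variables whose missing rate among remaining cases exceeds the threshold.
    kept_columns = [
        c for c in rows[0].keys()
        if missing_fraction([r[c] for r in kept_rows]) <= max_var_missing
    ]
    return kept_rows, kept_columns

# Hypothetical toy records; values are not taken from the study.
rows = [
    {"albumin": 28, "bun": 9.1, "inr": None, "crp": None},
    {"albumin": 35, "bun": 6.0, "inr": 1.1, "crp": None},
    {"albumin": None, "bun": None, "inr": None, "crp": None},
    {"albumin": 31, "bun": 7.9, "inr": 1.0, "crp": 12.0},
    {"albumin": 33, "bun": 5.5, "inr": 1.2, "crp": None},
]
kept_rows, kept_columns = screen_missingness(rows, ["albumin", "bun", "inr", "crp"])
```

In this toy example the third record is dropped (all laboratory values missing), after which only `albumin` and `bun` stay below the 20% variable-level threshold.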

Variables selection methods
To enhance the clinical applicability and interpretability of the predictive model, continuous variables were transformed into categorical variables using both general standard cutoff values and specific cutoff values used at the First Medical Center of PLAGH. To identify the most relevant predictors and reduce dimensionality, we employed five variable selection methods: stepwise selection by Akaike Information Criterion ('StepAIC', 'MASS' package), Least Absolute Shrinkage and Selection Operator ('LASSO', 'glmnet' package), elastic net ('ENT', 'glmnet' package), best subset selection ('BestSub', 'leaps' package), and Recursive Feature Elimination ('RFE', 'caret' package). All selection methods were applied to the five training iterations, and the variables selected by each method across iterations were combined. Where the number of selected variables exceeded 10, we retained the 10 variables that appeared most frequently in the selection results.
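The "keep the 10 most frequent variables" rule can be sketched as a simple frequency count over the per-iteration selections. This Python sketch is illustrative only: the function name, the alphabetical tie-breaking rule, and the example selections are assumptions, since the text does not say how ties were resolved.

```python
from collections import Counter

def most_frequent_variables(selections, top_n=10):
    """selections: one list of selected variable names per training iteration."""
    # Count each variable at most once per iteration.
    counts = Counter(var for selected in selections for var in set(selected))
    # Sort by descending frequency; break ties alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [var for var, _ in ranked[:top_n]]

# Hypothetical selections from three iterations of one method.
runs = [
    ["Albumin", "BUN", "Cancer", "Pulse", "INR"],
    ["Albumin", "BUN", "Cancer", "Altered Mental", "SBP"],
    ["Albumin", "BUN", "Pulse", "Altered Mental", "INR"],
]
top3 = most_frequent_variables(runs, top_n=3)
```

With these toy inputs, `Albumin` and `BUN` appear in every run, and the third slot goes to the alphabetically first of the variables selected twice.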

Model training methods and evaluation in internal and external evaluation
On the training dataset, we applied four distinct model training techniques using the 'train' function from the 'caret' package: Generalized Linear Models (GLM), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF). Each was chosen for its specific strengths in capturing the relationships between the predictor variables and the outcome.
To ensure robust model assessment and address the limitations of our sample size, we employed repeated k-fold cross-validation, using the 'trainControl' function from the 'caret' package. The cross-validation was repeated five times, each time with 20 folds, giving more stable and reliable estimates of model performance while mitigating the risk of overfitting.
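The resampling scheme (five repeats of 20-fold cross-validation, i.e. 100 train/test splits) can be sketched as below. This is an illustrative, unstratified Python sketch rather than a reproduction of caret's internals; the function name, seed, and sample size of 200 are hypothetical.

```python
import random

def repeated_kfold(n_samples, n_folds=20, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random ordering for every repeat
        for fold in range(n_folds):
            test_idx = indices[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx

# Hypothetical sample size chosen only to keep the example small.
splits = list(repeated_kfold(n_samples=200))
first_repeat_tests = [test for repeat, _, _, test in splits if repeat == 0]
```

Within each repeat every observation is held out exactly once, and performance is then averaged over all 100 held-out folds.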
The models were trained on the five training datasets using the variable subsets obtained from the feature selection process. The trained models were then evaluated on the corresponding five internal validation datasets, allowing their performance to be assessed across different data partitions. For each iteration on the internal validation data, we calculated specificity, sensitivity, accuracy, F1 score, and area under the curve (AUC). To provide a more reliable estimate of overall performance, the mean value of each metric across the five internal validation iterations was calculated.
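The threshold-based metrics named above follow directly from the confusion matrix, and the final step is a plain average over iterations. The Python sketch below is illustrative; it assumes death is coded as the positive class (1), which the text does not state, and the labels and predictions are made up.

```python
from statistics import mean

def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and F1; assumes both classes occur
    and at least one positive prediction is made."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

def average_metrics(per_iteration):
    """Mean of each metric across validation iterations."""
    return {k: mean(m[k] for m in per_iteration) for k in per_iteration[0]}

# Hypothetical labels and predictions for one validation iteration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
m = classification_metrics(y_true, y_pred)
```

Because sensitivity and specificity swap roles when the positive class is changed, the choice of positive class should be stated whenever such metrics are reported.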
Once the optimal model was identified and the scoring system was established, we used the AUC to validate its performance and compared it with the established AIMS65 score in both internal and external validation.

Characteristics of participants in the development dataset
A total of 1899 distinct patients diagnosed with UGIB were included in the development dataset and subsequently divided into training and internal validation subsets, after applying the inclusion and exclusion criteria detailed in the Methods section. An external validation dataset of 330 patients was defined using the same criteria. A flow diagram illustrating the participant selection process is presented in Fig. 1. The cohort had a median age of 72 years, ranging from 65 to 102 years. During the 30-day follow-up period, 97 patients died, and an additional 21 patients died after the 30-day period.
Within the development dataset, patients who died within 30 days were compared with those who survived. Demographic characteristics, comorbidities, and laboratory results are compared and summarized in Table 1.
In Table 1, the "Alive" group shows a significantly higher proportion of in-hospital endoscopy, variceal cases, and alcohol use than the "Death" group. Conversely, the "Death" group had a higher mean age, a higher CCI value, and a greater frequency of altered mental status. The "Death" group also had a shorter LOS than the "Alive" group.

Variables subset selection
Following imputation and dataset splitting, variable selection was performed with five distinct methods on the entire training dataset. The outcomes of these selection methods varied, as shown in Table 2. A total of 15 variables were selected across the five methods. The StepAIC, RFE, and BestSub methods identified more than 10 different significant features; for these, the 10 most frequently occurring features are included in Table 2.
LASSO and ENT each produced a relatively stable result. INR was selected by four of the methods, while HF and Variceal were each chosen by three methods. SBP was chosen by only two methods. Variables such as AGE, eGFR, HGB, ICH, Liver Diseases, and Peptic Ulcer were each selected by a single method. Based on the variables consistently selected across all methods, namely "Albumin", "BUN", "Cancer", "Altered Mental", and "Pulse", we created an additional subset called "ABCAP" comprising these five variables.

Performance of models combined with variable subsets in internal validation
The evaluations of the training methods combined with the different feature selection results in internal validation are presented in Table 3. For instance, the combination KNN + StepAIC denotes a KNN prediction model trained with the variable subset selected by StepAIC. All combinations achieved accuracy, sensitivity, and F1 score levels slightly above 0.9. Combinations involving RF, KNN, and SVM models had difficulty correctly identifying negative instances, resulting in lower overall AUC values, especially for SVM. The GLM models consistently outperformed the other methods. Among them, GLM + BestSub performed best and achieved the highest AUC (0.89; 95% CI: 0.831-0.949), with other GLM combinations also showing promising results. The AUC for GLM + ABCAP, which includes only five key variables, was 0.879 (95% CI: 0.818-0.939), slightly below the AUC of 0.888 for GLM + BestSub.
Figure 2 displays the receiver operating characteristic (ROC) curves for the combination with the highest AUC under each training method, together with the GLM + ABCAP combination. While GLM + BestSub was the top performer in terms of AUC, the balance between model complexity and performance must also be considered. The GLM + ABCAP combination offers a more parsimonious model, using only five variables compared with the ten used by GLM + BestSub. This suggests that GLM + ABCAP may be the more appropriate choice when a simpler and more interpretable model is sought, without a significant sacrifice in predictive performance.

Generalized linear model equation and ABCAP score
After carefully considering various factors, we chose the GLM + ABCAP combination as our final model. The resulting GLM equation, trained using the training dataset, takes the form:

$$\operatorname{logit}\big(P(\text{death})\big) = -3.71 + \beta_1\,\text{Cancer} + \beta_2\,\text{Altered Mental} + \beta_3\,\text{Pulse}_{>100/\text{min}} + \beta_4\,\text{BUN}_{>7.5\,\text{mmol/L}} + \beta_5\,\text{Albumin}_{\geq 30\,\text{g/L}}$$

where each predictor is coded 1 if present and 0 otherwise, and $\beta_1$ to $\beta_5$ denote the estimated coefficients (see Table 4). In the GLM equation, each variable is represented by a coefficient value, indicating its contribution to the log-odds of the outcome (death). A positive coefficient suggests a positive association with the outcome, while a negative coefficient suggests a negative association. By plugging in the respective values of the variables, the predicted log-odds, and hence the probability, of death can be obtained.
The equation and Table 4 reveal important associations between the variables and the outcome. The intercept term of -3.71 represents the log-odds of the outcome when all predictor variables are at their reference levels or baseline values. Among the five variables in the model, Cancer, Altered Mental, Pulse (> 100/min), and BUN (> 7.5 mmol/L) have positive coefficients, indicating an elevated risk of the outcome. Conversely, Albumin (≥ 30 g/L) has a negative coefficient, suggesting a protective effect on prognosis, while Albumin (< 30 g/L) has the opposite effect. The odds ratios further quantify these associations: odds ratios exceeding 1 signify an increased risk, whereas odds ratios below 1 indicate a decreased risk.
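The link between coefficients, odds ratios, and predicted probability is the standard logistic transformation, sketched below in Python for illustration. Only the intercept of -3.71 is taken from the text; the coefficient passed to `odds_ratio` is a hypothetical value, since the fitted coefficients are not reproduced here.

```python
import math

INTERCEPT = -3.71  # reported in the text

def death_probability(log_odds):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)

# Predicted probability when every predictor is at its reference level.
baseline_probability = death_probability(INTERCEPT)

# Hypothetical coefficients, used only to show the direction of the effect.
example_or_harmful = odds_ratio(1.0)      # > 1: increased risk
example_or_protective = odds_ratio(-1.0)  # < 1: decreased risk
```

With the reported intercept, the baseline predicted probability is roughly 2.4%; each positive coefficient added to the log-odds moves that probability upward.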
The ABCAP score, derived from the GLM equation above, simplifies calculation by assigning one point to each of the five predictors. In parallel, the AIMS65 score, a validated scoring system for predicting in-hospital mortality in patients with UGIB, also incorporates five variables, each assigned 1 point [6].
Table 5 compares the variables in the two scoring systems used to assess severity and predict in-hospital mortality for patients with UGIB. The ABCAP score includes Cancer, Altered Mental Status, Pulse > 100/min, Albumin < 30 g/L, and BUN > 7.5 mmol/L. In contrast, the AIMS65 score comprises albumin < 3.0 g/dL, INR > 1.5, altered mental status, systolic blood pressure ≤ 90 mmHg, and age ≥ 65 years [6].
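Because each ABCAP item contributes one point, the score reduces to counting how many of the five criteria are met. A minimal Python sketch follows; the function and parameter names are hypothetical, while the thresholds are those stated for the ABCAP score.

```python
def abcap_score(albumin_g_l, bun_mmol_l, has_cancer, altered_mental_status, pulse_bpm):
    """ABCAP score (0-5): one point for each criterion that is met."""
    criteria = [
        albumin_g_l < 30,             # Albumin < 30 g/L
        bun_mmol_l > 7.5,             # BUN > 7.5 mmol/L
        bool(has_cancer),             # Cancer present
        bool(altered_mental_status),  # Altered mental status
        pulse_bpm > 100,              # Pulse > 100/min
    ]
    return sum(criteria)

# Hypothetical patient: low albumin, raised BUN, altered mental status.
example_score = abcap_score(
    albumin_g_l=27, bun_mmol_l=9.2, has_cancer=False,
    altered_mental_status=True, pulse_bpm=88,
)
```

All five inputs are available at admission, which is what allows the score to be calculated before endoscopy.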

Comparison of ABCAP score with AIMS65 score in internal and external validation
Table 6 presents the characteristics of the external validation dataset. All variables included in both scoring systems differed significantly between patients who survived and those who did not. Figure 3 shows the ROC curves of the original equation, the AIMS65 score, and the ABCAP score in both internal and external validation. In the internal validation, the original equation achieved an AUC of 0.886 (95% CI: 0.832-0.940), the ABCAP score attained an AUC of 0.878 (95% CI: 0.824-0.932), and the AIMS65 score demonstrated an AUC of 0.827 (95% CI: 0.751-0.904); the p-value of the DeLong test between the ABCAP and AIMS65 scores was 0.048. In the external validation, both the ABCAP and AIMS65 scores experienced a slight decrease in their AUC values: the ABCAP score yielded an AUC of 0.799 (95% CI: 0.709-0.889) and the AIMS65 score an AUC of 0.743 (95% CI: 0.647-0.838), with no significant difference between the two scores on the DeLong test (p = 0.16).
[...] across these levels, with the "0" and "1" point levels having the highest patient counts in both scoring systems. The cumulative counts show that the numbers of patients with low scores (0 to 2 points) and with high scores (3 to 5 points) are similar for the two scoring methods. However, the cumulative death counts and the associated ratios show distinct patterns. For low scores, the ABCAP system shows no significant difference from the AIMS65 system. In contrast, for high scores, the ABCAP score shows higher mortality ratios than the AIMS65 system. Of particular interest is the "Same score patients counts" column, which gives the number of patients who received identical scores in both the ABCAP and AIMS65 systems. Remarkably, the proportion of patients with the same score was consistently less than 50% across all score levels in the AIMS65 system.
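For an integer score such as ABCAP or AIMS65, the AUC has a direct interpretation: it is the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. The Python sketch below computes it by pairwise comparison. It is illustrative only, uses made-up data, and does not implement the confidence intervals or the DeLong test reported above.

```python
def auc_from_scores(scores, died):
    """AUC as the probability that a death outranks a survivor (ties count 0.5)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores and outcomes for six patients.
scores = [0, 1, 1, 2, 3, 4]
died = [0, 0, 1, 0, 1, 1]
auc = auc_from_scores(scores, died)
```

Ties are frequent with a six-level score, so handling them explicitly matters; ignoring them would bias the estimate.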
Table 8 provides the cumulative mean values of the statistical metrics for each score level and the corresponding mortality rates obtained from the five imputation datasets using the ABCAP score. As the score level increases, sensitivity declines while specificity rises. The positive predictive value (PPV) increases, whereas the negative predictive value (NPV) decreases with higher score levels.
In terms of mortality rates, score levels of ≥ 1 and ≥ 2 were associated with mortality rates of 9.27% and 20.58%, respectively. The positive likelihood ratio (PLR) values for these score levels were 1.898 and 4.815. For score level ≥ 3, the mortality rate increased to 36.69%, with a PLR of 12.232. At score levels ≥ 4 and 5, mortality rates rose to 67.73% and 79.67%, with PLR values of 40.296 and 74.309, respectively. Considering the distribution of ABCAP scores and the observed metrics at each score level, we classify scores of 0 to 2 as low risk, a score of 3 as moderate risk, and scores of 4 to 5 as high risk.
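The risk tiers proposed above and the definition of the positive likelihood ratio can be written down compactly. In this illustrative Python sketch the cutoffs (0-2 low, 3 moderate, 4-5 high) come from the text, the PLR formula is the standard sensitivity / (1 - specificity), and the function names and the sensitivity and specificity values in the example are hypothetical.

```python
def abcap_risk_group(score):
    """Map an ABCAP score (0-5) to the risk tier proposed in the text."""
    if not 0 <= score <= 5:
        raise ValueError("ABCAP score must be between 0 and 5")
    if score <= 2:
        return "low"
    if score == 3:
        return "moderate"
    return "high"

def positive_likelihood_ratio(sensitivity, specificity):
    """PLR = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

# Hypothetical operating point, not taken from Table 8.
example_plr = positive_likelihood_ratio(sensitivity=0.8, specificity=0.9)
tiers = [abcap_risk_group(s) for s in range(6)]
```

A PLR well above 10, as reported for score levels ≥ 3, means a positive result raises the odds of death by more than tenfold relative to the pre-test odds.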

ABCAP score performance in the variceal and nonvariceal groups
The distribution of the ABCAP score at different score levels and its performance within the variceal (622 patients) and nonvariceal (1277 patients) groups of the development dataset are shown in Fig. 4. The number of patients in the nonvariceal group was nearly double that in the variceal group. The distribution of ABCAP scores across levels within each group corresponds to the proportion of individuals in that group.
To assess the predictive performance of the ABCAP score, we calculated the AUC for both the variceal and nonvariceal groups. In the variceal group, the AUC was 0.881 (95% CI: 0.805-0.958), indicating strong predictive accuracy. Similarly, in the nonvariceal group, the AUC was 0.873 (95% CI: 0.834-0.912). A P value of 0.853 indicates no significant difference in the performance of the ABCAP score between the two groups.
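The text does not state which test produced the P value for the subgroup comparison. One simple possibility for two independent groups, shown here only as an assumption, is a z-test on the difference in AUCs, with standard errors recovered from the reported 95% confidence intervals under a normal approximation. The Python sketch below applies it to the reported AUCs and intervals; the function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)

def compare_independent_aucs(auc1, ci1, auc2, ci2):
    """Two-sided z-test for the difference between AUCs from independent groups."""
    se1 = se_from_ci(*ci1)
    se2 = se_from_ci(*ci2)
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    normal_cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - normal_cdf)
    return z, p_value

# Reported AUCs and 95% CIs for the variceal and nonvariceal groups.
z, p_value = compare_independent_aucs(0.881, (0.805, 0.958), 0.873, (0.834, 0.912))
```

Under these assumptions the approximation gives a P value close to the reported 0.853, which supports the conclusion of no meaningful difference between subgroups, although it does not confirm which test the analysis actually used.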

Discussion
Despite significant advancements in the prevention and treatment of UGIB, the in-hospital prognosis of older patients remains a challenge. Among the 1899 patients in our development dataset, those who did not survive had a lower rate of in-hospital endoscopy than the surviving group, even though the majority of patients underwent endoscopy as part of their care. This highlights the need for a simplified scoring system that rapidly evaluates the prognosis of older patients and supports timely and appropriate medical decisions before endoscopy.
While established scoring systems such as the Glasgow-Blatchford score (GBS), Rockall score (RS), and AIMS65 score have undergone extensive validation and implementation for patient triage in clinical settings, the severity of various acute and chronic conditions may differ in older individuals compared with their younger counterparts [15,16]. It is therefore important to explore risk factors specific to the challenges encountered by this patient population.
Our study specifically targets 30-day in-hospital mortality, which was 5.1% in our dataset, compared with a rate of 1.37% reported in a broader population of 253,947 patients. This focus differs from the AIMS65, RS, and GBS scores, which generally consider overall in-hospital mortality. Concentrating on this short-term outcome allows direct evaluation of immediate care effectiveness and risk stratification within a crucial timeframe, offering insights for acute care prioritization. It also minimizes biases related to long-term follow-up, such as varying patient adherence and changes in healthcare access or treatment strategies. To overcome the above limitations and predict the short-term outcome, we developed the ABCAP score, a simplified scoring system specifically designed for older patients with UGIB.
Previous studies have primarily relied on multivariable logistic regression to identify risk factors associated with outcomes in older populations with UGIB [17]. We combined traditional and more recent techniques (StepAIC, LASSO, ENT, RFE, and best subset selection) to select key variables during training and prediction modeling.
Each of these variable selection methods offers distinct benefits and challenges. For instance, StepAIC often selects a broad set of features, which is not always ideal, whereas best subset selection methodically identifies the best-fitting predictor set, albeit with high computational demands and a risk of overfitting when there are many predictors. Our iterative process showed that StepAIC, BestSub, and RFE sometimes selected more than 10 variables and that the stability of these selections varied, suggesting a risk of overfitting. Conversely, LASSO and ENT were more consistent, selecting 6 and 7 variables respectively, with notable agreement in their choices, which underscores their effectiveness in identifying a compact, predictive set of variables.
In our study, 15 variables were identified as potential predictors of in-hospital death through the selection process. Among these, Age had only a limited impact in our older population, suggesting that its predictive value may be surpassed by other determinants in individuals over 65 years old. The analysis also underscored the importance of other variables such as ICH [18,19], eGFR [20], and Peptic Ulcer Bleeding [21], with renal function having significant implications for UGIB outcomes and Peptic Ulcer Bleeding presenting more frequently than variceal bleeding [22]. The variability in risk associated with these conditions underscores the heterogeneity of UGIB patient outcomes and calls for a tailored approach to risk stratification. To identify the most effective predictors, we explored various combinations of these variables and assessed their collective impact.
In our analysis, we initially examined age, BMI, and SBP as continuous variables. Given their small coefficients, we categorized these variables to enhance the interpretability and practicality of the scoring system. This categorization clarified their impact on outcomes and aligned the model with clinical guidelines, improving real-world usability. While this transformation may lose some information, careful cutoff selection, based on laboratory standards for BUN, Albumin, and Pulse, ensured the model's clinical relevance and consistency with existing practice.
Ultimately, we arrived at a subset of five variables (Albumin, BUN, Cancer, Altered Mental Status, and Pulse) that were consistently selected by all five methods and that form our scoring system. This subset was established manually, and its performance was then evaluated.
Traditional regression models have a well-established history of application and validation across various studies, leading to the development of widely used scoring systems.Notably, GBS and RS rely on logistic regression and forward stepwise techniques, respectively, while AIMS65 employs the recursive partition approach, a more recent decision tree method.Recent years, have seen the rise of machine learning techniques-RF, KNN, and SVM-enhancing prediction in areas ranging from bleeding events in valve replacement patients [23] to early Alzheimer's detection [24] and medication adherence in chronic conditions [25].While these advanced methods offer robust classification capabilities for categorical data, choosing the right one depends on the study's specific needs and data characteristics.Despite their potential, machine learning approaches come with challenges, including their opaque "black box" nature and the necessity for careful parameter tuning to refine predictions.
In our study, we employed a comprehensive set of predictive modeling methods, including RF, KNN, SVM, and GLM, to conduct prediction and classification tasks.The combinations of RF, KNN, SVM, and GLM demonstrated diverse performance in predicting binary outcomes.Overall, most combinations exhibited strong performance with high accuracy and sensitivity.
Despite dedicated efforts to fine-tune the critical parameters of each machine learning approach, the results showed minimal or marginal improvements.However, it is important to highlight that generalized linear models demonstrated commendable performance and suitability in this context.This pattern led us to hypothesize that the challenges in applying machine learning methods to this specific cohort arise from its unique characteristics.Machine learning methodologies generally shine when dealing with high-dimensional, complex datasets.However, our attempts to use machine learning methods with all variables in model training yielded only marginal improvements in AUC, while complicating the prediction model considerably.
In our comparative analysis of the three machine learning methods alongside the GLM-based combinations, with a special focus on the GLM + BestSub combination, a consistent pattern emerged. This combination consistently demonstrated well-balanced performance across a range of evaluation metrics. Notably, the ABCAP score, which was manually derived from the five selected variables, displayed only slightly lower metric values than GLM + BestSub.
Our decision to develop the ABCAP score was influenced by several factors, including the need for interpretable results, data availability, domain expertise, and practical ease of calculation and application.
When contrasting the ABCAP score with the GBS, RS, and AIMS65 scores, there are both shared and distinct variables. For instance, BUN and Pulse, featured in the ABCAP score, are also significant factors in other scoring systems such as the GBS and RS. Additionally, Cancer is present in both the ABCAP score and the RS. The inclusion of Cancer as a predictive factor is relevant because of its prevalence in our study population, which reflects the age-related increase in cancer cases, and because of its substantial impact on prognosis [26]. Furthermore, a multicenter study on chronic diseases among older inpatients in China, which used our development dataset, revealed that malignancy remains the leading cause of in-hospital mortality [27].
Another significant observation in our study is the dominant role of serum albumin levels, rather than HGB levels, at the time of presentation. This finding aligns with recent research and with the AIMS65 score, both of which emphasize the importance of hypoalbuminemia in forecasting mortality in upper gastrointestinal bleeding and critical illness [28,29]. Interestingly, hypoalbuminemia remains absent from the RS and GBS systems, despite its clinical relevance.
Analyzing the disparities between internal and external validation performance requires consideration of the differences in the basic characteristics of the study populations. It is important to note that the AIMS65 and ABCAP scores are tailored for distinct patient groups and outcomes. Our focus on 30-day in-hospital mortality diverges from AIMS65, which considers overall in-hospital mortality without a specific time frame. This divergence likely contributes to the differences in predictive performance.
During internal validation, the ABCAP score showed superior predictive capability, with a significantly higher AUC than the AIMS65 score (0.878 vs. 0.827; DeLong test, p = 0.048). In external validation, both scores showed reduced performance but remained within acceptable ranges. The ABCAP score continued to exhibit a higher AUC than the AIMS65 score (0.799 vs. 0.743), but the difference was not statistically significant (p = 0.16), indicating comparable performance in the external cohort.
Because our study focused on individuals aged 65 and older in China, every patient receives the fixed 1 point that the AIMS65 score assigns for age, so its minimum value in this cohort is 1, whereas the ABCAP score has a minimum value of 0. This may affect the predictive accuracy of AIMS65 relative to ABCAP. The observation highlights the ABCAP score's robust performance and adaptability for our demographic, although it does not conclusively outperform AIMS65 in external validation.
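The age effect described above can be illustrated with a minimal sketch. The thresholds below are the commonly cited AIMS65 criteria and are not restated in this section, so they should be read as assumptions; the function and parameter names are hypothetical, and the age criterion is written as age ≥ 65 to match this study's cohort definition.

```python
def aims65_score(albumin_g_per_dl: float, inr: float,
                 altered_mental_status: bool,
                 systolic_bp_mmhg: float, age_years: float) -> int:
    """One point per criterion met (assumed standard AIMS65 thresholds)."""
    return sum([
        albumin_g_per_dl < 3.0,    # Albumin below 3.0 g/dL
        inr > 1.5,                 # INR above 1.5
        altered_mental_status,     # altered Mental status
        systolic_bp_mmhg <= 90,    # Systolic blood pressure 90 mmHg or lower
        age_years >= 65,           # age 65 or older (always true in this cohort)
    ])


# A 70-year-old patient with no other criterion met still scores 1.
lowest_possible_in_cohort = aims65_score(4.0, 1.0, False, 120, 70)
```

In a cohort restricted to patients aged 65 and older, the age criterion is met by everyone, so the score effectively ranges from 1 to 5 rather than 0 to 5.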
We further examined the patient and death counts at each score level for both scoring systems across the entire development dataset. The cumulative patient counts in the 1 to 2 point and 3 to 5 point categories were similar for the two scores. However, a notable divergence became evident in the corresponding death counts and their ratios. In our study cohort, the 3 to 5 score range of the ABCAP score stratified mortality more effectively than the same range of the AIMS65 score. This comparison underscores the importance of selecting a risk assessment tool that matches the characteristics of the patient population, which in turn supports clinical decision-making and patient care.
In a cumulative analysis of the corresponding metrics across each score level in the development dataset, an upward trend in both mortality and positive likelihood ratio (PLR) was observed with increasing ABCAP scores. However, this trend was not consistently smooth. Based on the significant increase in mortality and PLR with higher scores, we established a risk stratification system. Patients with scores of 0 to 2 were categorized as low risk, with a mortality rate lower than 13%. A score of 3 indicated moderate risk, corresponding to a noticeable increase in mortality to 30.4%. For patients scoring 4 or 5, representing high risk, the mortality rate escalated further, ranging from 57.6% to 80%.
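As a minimal sketch, the ABCAP score and the three risk strata described above could be computed as follows. It uses the five criteria that define the score (Albumin < 30 g/L, BUN > 7.5 mmol/L, Cancer presence, Altered mental status, Pulse rate > 100/min, one point each); the class, field, and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    albumin_g_per_l: float        # serum albumin, g/L
    bun_mmol_per_l: float         # blood urea nitrogen, mmol/L
    has_cancer: bool
    altered_mental_status: bool
    pulse_per_min: float          # pulse rate, beats per minute


def abcap_score(p: Patient) -> int:
    """One point per criterion met, giving a total of 0 to 5."""
    return sum([
        p.albumin_g_per_l < 30,
        p.bun_mmol_per_l > 7.5,
        p.has_cancer,
        p.altered_mental_status,
        p.pulse_per_min > 100,
    ])


def risk_group(score: int) -> str:
    """Map a score to the strata described in the text."""
    if not 0 <= score <= 5:
        raise ValueError("ABCAP score must be between 0 and 5")
    if score <= 2:
        return "low"
    if score == 3:
        return "moderate"
    return "high"


example = Patient(albumin_g_per_l=28, bun_mmol_per_l=9.0, has_cancer=True,
                  altered_mental_status=False, pulse_per_min=110)
example_score = abcap_score(example)      # four criteria met
example_group = risk_group(example_score)
```

The example patient meets four criteria (low albumin, raised BUN, cancer, tachycardia) and therefore falls in the high-risk stratum.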
This risk stratification framework offers valuable guidance for healthcare providers managing older patients with UGIB. If a patient's ABCAP score is 3 or higher, timely intervention becomes crucial because of the significantly elevated risk of mortality. Conversely, if the score is below 3, the prognosis is generally expected to be more favorable, although the mortality rate remains relatively high. This risk-based approach supports informed decision-making and helps prioritize appropriate interventions for optimal patient care.
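To make the role of the cut-off concrete, the sketch below computes sensitivity, specificity, and PLR (sensitivity divided by one minus specificity) when patients with a score of 3 or higher are treated as test-positive. The function name is hypothetical, and the score and outcome lists are toy values invented for illustration, not study data.

```python
def threshold_metrics(scores, died, cutoff=3):
    """Sensitivity, specificity and PLR for 'score >= cutoff' as a positive test."""
    pairs = list(zip(scores, died))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)       # flagged, died
    fn = sum(1 for s, d in pairs if s < cutoff and d)        # missed deaths
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)   # flagged, survived
    tn = sum(1 for s, d in pairs if s < cutoff and not d)    # not flagged, survived
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one survivor")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # PLR is undefined when specificity is exactly 1; report infinity in that case.
    plr = sens / (1 - spec) if spec < 1 else float("inf")
    return {"sens": sens, "spec": spec, "plr": plr}


# Toy data only: eight hypothetical patients.
toy_scores = [0, 1, 2, 3, 4, 5, 1, 3]
toy_died = [False, False, False, True, True, True, True, False]
toy_metrics = threshold_metrics(toy_scores, toy_died, cutoff=3)
```

With these toy values, three of the four deaths and one of the four survivors have a score of 3 or higher, so sensitivity and specificity are both 0.75 and the PLR is 3.0.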
Variceal UGIB, often linked to liver disease and esophageal or gastric varices, contrasts with nonvariceal UGIB caused by conditions such as peptic ulcers or Mallory-Weiss tears. Research in UGIB commonly focuses on nonvariceal cases because variceal bleeding requires specialized treatment. The differentiation between variceal and nonvariceal UGIB, crucial in many studies, can be complex. For example, one investigation reported higher mortality in nonvariceal bleeding [30], while another found no significant outcome differences between the two groups [22].
In our dataset, differentiating between variceal and nonvariceal bleeding was complicated by sparse information on varices, reflecting the lower endoscopy rates in the older population. Despite this, our univariable analysis indicated an association between variceal bleeding and prognosis, although it was selected as a variable by only three methods. Crucially, the ABCAP score performed consistently across both variceal and nonvariceal UGIB cases, showing no significant differences in score distribution or predictive accuracy. This underscores the ABCAP score's utility in risk stratification for UGIB and its value as a versatile prognostic tool regardless of variceal status.

Limitations
Our study has provided valuable insights and promising results. However, certain limitations need to be acknowledged. First, the relatively small sample size may limit the generalizability of our findings to a broader population. Additionally, missing data and potential biases in our retrospective study require careful consideration, even though we used imputation methods cautiously. In terms of study design, the single-center retrospective design may introduce selection bias and limits the ability to establish causality. A multicenter prospective study would enhance the robustness of our findings.

Conclusion
In conclusion, the ABCAP score, which incorporates Albumin, BUN, Cancer, Altered mental status, and Pulse as key variables, demonstrated strong predictive performance in assessing the 30-day in-hospital mortality risk for older patients with UGIB. The score outperformed the AIMS65 score in internal validation and maintained consistent, albeit not statistically superior, performance in external validation. Notably, at score levels of 3 to 5, ABCAP identified high-risk patients more effectively than AIMS65, underscoring its potential for more nuanced risk stratification. Importantly, the ABCAP score effectively stratifies mortality risk in both variceal and nonvariceal bleeding cases. Nonetheless, to establish its wider applicability and generalizability, further validation studies across diverse healthcare settings and patient populations are imperative. The ABCAP score is a promising tool to help clinicians make well-informed decisions and prioritize appropriate interventions during the acute care phase for older UGIB patients.

Fig. 1
Fig. 1 Flow diagram of participants in study

Fig. 2
Fig. 2 Maximum AUC of each training method in internal validation

Fig. 3
Fig. 3 AIMS65 score, ABCAP score and original equation performance in internal and external validation

Fig. 4
Fig. 4 ABCAP score distribution and performance in the variceal and nonvariceal groups of the development dataset

Table 1
Baseline characteristics of patients in the development dataset

Table 2
Variable selection results by different methods in the training dataset

Table 3
Model performance in internal validation

ABCAP score level metrics and risk stratification
Table 7 outlines the patient and death counts within the development dataset, categorized according to each score level for both the ABCAP and AIMS65 scoring systems. Notably, the distribution of scores exhibits variations

Table 4
Generalized linear models with ABCAP variables in the training dataset

Table 5
Features and points of the ABCAP score and AIMS65 score

Table 6
Characteristics of external validation dataset

Table 7
Score and death counts of ABCAP and AIMS65 in the development dataset

Table 8
Cumulative mean of statistical metrics in the total dataset using the ABCAP score. Sens: sensitivity; Spec: specificity; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio